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Adaptation in the retina is thought to optimize the encoding of natural hght signals into sequences 
of spikes sent to the brain. However, adaptation also entails computational costs: adaptive code 
is intrinsically ambiguous, because output symbols cannot be trivially mapped back to the stimuli 
without the knowledge of the adaptive state of the encoding neuron. It is thus important to learn 
which statistical changes in the input do, and which do not, invoke adaptive responses, and ask 
about the reasons for potential limits to adaptation. We measured the ganglion cell responses in 
the tiger salamander retina to controlled changes in the second (contrast), third (skew) and fourth 
(kurtosis) moments of the light intensity distribution of spatially uniform temporally independent 
stimuli. The skew and kurtosis of the stimuli were chosen to cover the range observed in natural 
scenes. We quantified adaptation in ganglion cells by studying two-dimensional linear-nonlinear 
models that capture well the retinal encoding properties across all stimuli. We found that the 
retinal ganglion cells adapt to contrast, but exhibit remarkably invariant behavior to changes in 
higher-order statistics. Finally, by theoretically analyzing optimal coding in LN-type models, we 
showed that the neural code can maintain a high information rate without dynamic adaptation 
despite changes in stimulus skew and kurtosis. 



I. INTRODUCTION 



Adaptation is ubiquitous in the nervous system, from 



synaptic depres sion (Abbott et al. , 19971 Tsodyks fc 



Markram, 1997) and single neuron spiking ( Adrian[ 1928| 



Partridge fc Stevens] 



ules 



e.g. 



Miiller et a 



1976), to the activity of neural mod- 



(1999)). In sensory systems, it has 
been suggested to be a ke y design principle of the neural 
code (Wark et al. 2007), which may allow for optimal 
information coding by matching the neural responses to 
stimulus statist i cs (|Atick| |1992[ |Atick fc Redlich[ |1990 



[Attneavel |1954| [Barlowl |196Tp The retina is one of the 



most studied highly adaptive neural circuits, in which the 
mapping between stimuli and neural response changes to 
match the statistics of the mean light intensity (Sh ap^ 



ley fc Enroth-Cugell 1984), temporal and spatial con- 



trast and spatial scale (Beaudoin et al. 



fc Chichilnisky , 2001 Smirnakis et al. 



(Hosoya et al. , 2005), relative motion (Olveczky et al. 



2007 



1997 



Chander 



pattern 



2007D and p"eriodicity (Schwartz fc Berry[ |200"8 r 



Since adaptation requires some form of memory and 
inference of the stimulus statistics to which the sys- 
tem should adapt, the mechanism and nature of adap- 
tation have been studied extensively. For example, the 
dyna mic structure of the reti nal ganglion cell receptive 
fields (Srinivasan et al, 1982| , and contrast adaptation 



in the vertebrate and fly visual systems (Brenner et 



* gt kacik@ist . ac . at 



al.| 20001 IChander fc Chichilnisky| 2001| 


LaughlinI 11981 


Shapley fc Victor | 19791 Smirnakis et a 


.[ |1997| Victor 



1986) have been characterized as gain-control mecha- 
nisms that serve to efliciently encode the variation of the 
stimulus around the mean into a limited dynamic range 
of flring rates at the output. It has been further shown 
that neural systems adapt not only to various stationary 
stimuli, but also to dynamic changes in stimul us distri- 
butions taking place across multiple timescales ( Fairhall 



et al. 2001 Ulanovsky et al. , 2004 Wallach et al., 2008). 



Despite its ubiquitous presence, it is still not clear what 
are the limits to adaptation, and in particular, which 
stimulus changes should lead to adaptive responses and 
which should not. Moreover, by its nature, adaptation 



comes with an inherent caveat or cost (Fairhall et al. 



2001): adaptive code is ambiguous, which requires the 
decoder to keep some knowledge of the coding context. 
Since most studies of adaptation analyzed neural sys- 
tems' response to flrst- and second-order spatio-temporal 
statistics in the stimulus, we addressed here the nature 
of neural response to changes in higher-order structure 
of visual stimuli; such hi gher-order structure is charac- 
teristic o f natu ral scenes ( Geisler[ 2008 Simoncelli fc 01- 
shausen] 2001) and is perceptually salient QChubb et al. 
20041 IPortilla fc Simoncellil 120001 ITkacik et all |201ip. 



To characterize adaptation and invariance to stimu- 
lus statistics beyond luminance and contrast, we stud- 
ied retinal responses to spatially uniform stimuli where 
light intensities were drawn independently from distri- 
butions with tunable amounts of skewness and kurtosis. 
We analyzed salamander retinal ganglion cell responses 



2 



using maximally informative dimensions, and quantified 
the encoding properties of the neurons using 2D linear- 
nonlinear (LN) models. We found clear signatures of con- 
trast adaptation in the rescaling of nonlinearities to these 
stimuli, but also that the same neurons display striking 
encoding invariance to changes in stimulus skewness or 
kurtosis, even when they are exposed to strongly bimodal 
stimulus distributions. 



II. MATERIALS AND METHODS 

Natural image statistics. To sample the range of naturally 
occurring values for contrast, skewness and kurtosis, we took a 
sample of 501 calibrated grayscale images from the Penn Natural 
Image Database (PNIDb) ( [Tkacik et al]|2011[ ). From each image 
we selected 400 random patches 200 X 200 pixels in size, which cor- 
responds in area to the angular size of about 3 degrees, roughly 
the size of the center receptive field of a salamander retinal gan- 
glion cell. Averaging over each patch to get the mean luminance 
in that patch, we computed the contrast, skewness and kurtosis of 
the distribution of patch luminances in a given image. Repeating 
the process over all images in our selection (containing shots of Ba- 
boon habitat in Okavango delta in Botswana, including landscape 
images, some with horizon, closeups of the ground, and a small 
selection of man-made objects in that habitat), we accumulated 
natural distributions of contrast, skewness and kurtosis. 

In our analyses, contrast is defined as C = (Tl/L^ where L 
is the mean luminance, cfl = sj {{L — L)^} is the std of the 
luminance distribution P{L) and brackets denote averaging over 
this distribution; skewness S = {(L — L)^)/a^] and kurtosis 
K = {(L — L)^)/(jf^ — 3. Note that kurtosis is defined to be for 
Gaussian distributions. 

Electrophysiology. Multi-electrode array recordings were 
performed on adult tiger salamander (Ambystoma tigrinum) 
(Meister e t al.[ [lj94| . All experiments were done according the 
regulation of Ben Gurion University of the Negev and the laws 
of the State of Israel. Prior to the experiment the salamander 
was adapted to bright light for 30 minutes. Retinas were isolated 
from the eye and peeled from the sclera together with the pigment 
epithelium. Retinas were placed with the ganglion cell layer facing 
a multielectrode array with 252 electrodes (Ayanda Biosystems, 
Switzerland) and superfused with oxygenated (95% O2, 5% CO2) 
Ringer medium which contained llOmM NaCl, 22mM NaHCOs, 
2.5mM KCl, ImM CaCb, 1.6mM MgCb, and 18mM glucose, at 
room temperature. The electrode diameter was 10/im and elec- 
trode spacing varied between 40 and 80 /im. Recordings of 24-30 
hours were achieved consistently. Extracellularly recorded signals 
were amplified (MultiChannel Systems, Germany), digitized at 
10 kHz on four personal computers and stored for off-line spike 
sorting and analysis. Spike sorting was done by extracting from 
each potential waveform the amplitude and width, followed by 
manual clustering using an in-house program written in MATLAB. 

Stimulation. The stimulus was projected onto the salaman- 
der retina from a CRT video monitor (ViewSonic G90fB) at a 
frame rate of 60Hz such that each stimulus frame was presented 
twice in a row (for a stimulus sampling rate of 30Hz) using stan- 
dard optics. The stimulus intensity was presented in grayscale and 
was gamma corrected for the monitor. Gaussian stimulus distri- 
butions with the desired variance were generated using MATLAB 
random number generator. We refer to all non-Gaussian stimuli 
as HOS (higher-order statistics) stimuli, which we generated with 
the statistics given in Table [l] as follows. The kurtotic distribu- 
tions were sums of two equal-variance Gaussian distributions of 
equal weight and displaced means. The skewed stimuli are sums of 
two Gaussian distributions with unequal variances. All resulting 



distributions have the same mean luminance of L 225 lux. 

We performed two experiments. In the first (23 cells), all 9 
stimuli were displayed in long, non-repeated sequences (52202 
frames of 33.33 ms each for each of the 9 stimuli), allowing us to 
infer LN models precisely; we used the Gaussian stimulus to fit LN 
models using both the spike-triggered average / covariance and 
maximally informative dimensions, to check how closely the two 
inference methods agree. In the second experiment (40 cells), the 
two Gaussian stimuli were absent, while for the 7 remaining HOS 
stimuli each non-repeated sequence was followed by a repeated 
sequence (30 repeats, 602 frames at 33.33 ms for each repeat), 
used to validate our models. 



stimulus S 


symbol 


contrast C 


skewness S 


kurtosis K 


Gaussian 


c+ 


0.097 










C++ 


0.177 








Skewed 


S- 


0.170 


-1.9 


5.1 


(HOS) 


S- 


0.172 


-1.0 


6.0 




s+ 


0.175 


1.0 


5.2 




S++ 


0.178 


1.9 


5.2 


Kurtotic 


K- 


0.177 





-1.8 


(HOS) 


K- 


0.176 





-0.9 




K+ 


0.173 





5.3 



TABLE I Stimuli S used in the experiment (see main 
text for the definition of the statistics C^S^K). The 
shorthand symbol for the stimulus starts with the C/S/K (for 
contrast, skew, kurtosis) and is followed by -,— ,+,++ (small 
magnitude and negative, large magnitude and negative, small 
magnitude and positive, large magnitude and positive); there- 
fore, S = {C+,C++,S~,S-,S+,S++,K-,K-,K+}. Parameters 
in the table denoted in bold were varied in each of the three 
stimulus categories. 



Linear filters. In inferring LN encoding models, reverse corre- 
lation techniques cannot directly be applied to non-Gaussian stim- 
uli because they lead to biased filter estimates. Ins tead, we used 
maximally informative dimensions (MID) (Sharpee et al.[ [2004| ). 
To look for a single significant filter ki, one performs the follow- 
ing maximization over possible linear filters ki , constrained to unit 
norm (ki • ki =1): 

4pike = maxk^ Dkl (^(ki • s|spike) | |P(ki • s)) . (1) 

Here Dxl is the Kullback-Leibler divergence jCover &: Thomas[ 
|1991| > between the spike-triggered distribution and the prior distri- 
bution of stimulus fragment projections onto ki, and s are stim- 
ulus fragments (those preceding the spike for the spike-triggered 
distribution, and all fragments for the prior distribution). To look 
for 2D models, we repeated the same optimization with two fil- 
ters {ki,k2}; the spike-triggered and prior distributions are two- 
dimensional in this case. For all neurons, the single most informa- 
tive filter ki was contained in the space of the 2 filters {ki, k2}; for 
further analysis, we rotated the system of reference such that the 
first filter was the single most informative filter ki (which mostly 
corresponded to the spike-triggered average for those cells that were 
exposed to Gaussian stimulus), while the second filter k2 spanned 
the {ki,k2} subspace together with ki, and formed an orthonor- 
mal basis, ki • k2 = 0, k2 • k2 = 1. 

The filters extended over 600 ms and were sampled on 36 
equidistant points, with temporal resolution of 16.67 ms. We ex- 
pressed the filters as a combination of 16 basis functions, ki,2 = 
^1 2^M' where are unit-area Gaussian bumps with 40 ms 
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width, uniformly tiling the 600 ms span of the filters, and ol are the 
expansion coefficients; we maximized /gpike in the space of param- 
eters OL. This expansion made the filters smooth and very slightly 
improved generalization performance, but the results were stable 
even if we inferred directly in the space of k. 

For performance reasons we estimated during MID op- 

timization runs using kernel-smoothing estimation; for final re- 
sults we used the context- weighted-tree (CTW) estimator ( [Sadeghi] 
|2009| >; the two estimators agreed without bias and to within 4% for 
final filters across all cells and stimuli. Optimization was done using 
custom stochastic gradient descent code that can avoid local max- 
ima. We performed two optimization runs for each cell and each 
stimulus, and the values of information per spike between the two 
runs differed by 1% on average, 98.6% of the runs had a difference 
smaller than 5%. 

To quantitatively compare the shapes of the filters across stimuli 
in experiment 1, we needed to ensure that each stimulus condition 
had enough spikes for good filter inference. We required each cell 
to have an average ffiing rate of at least 1.5 Hz; 15 out of 23 cells 
passed this cut. In experiment 1 we displayed Gaussian stimuli 
in addition to HOS stimuli, and we computed spike-triggered 
average / covariance to extract STA and the next most significant 
filter (orthogonal to the STA) from Gaussian segments. To judge 
the significance of the eigenvectors in the STC analysis we used 
bootstrapping with subsets of re corded spikes following jBialek "fc] 
|de Ruyter van Steveninck| ( |2005| >. 

Nonlinearities and PSTH prediction. After having 
reconstructed the linear filters {ki,k2}, we estimated the nonlin- 
earities as follows: M{v\^V2) — r P(i;i , 'U2 1 spike) /P('Ui , -^2), where 
vi^2 = ki^2 • s are the projections of the stimulus onto the two 
linear filters, and f is the mean ffiing rate of the neuron in a given 
stimulus condition. For 2D nonlinearities, we binned {vi^V2} 
values on a 16 X 16 grid; for estimating ID projections of the full 2D 
nonlinearity, we binned into a number of bins that was adaptively 
dependent on the number of spikes, and used kernel smoothing to 
approximate the probability distributions. Prediction performance 
was only slightly changed when 2D nonlinearities were sampled 
over 32 x 32 domain, and in general dropped due to overffiting 
when 64 x 64 bins were used. We used 2D LN models ffi on the 
nonrepeated segment of the stimulus to predict the PSTH for the 
repeated segments, using the time resolution of 16.67 ms, half the 
stimulus refresh rate. The ffi was quantified by computing the 
Pearson cross-correlation between the true and predicted PSTH. 

Information captured by the models. In the framework 
of LN encoding models, one assumes that only a small num- 
ber K of linear projections {-ui, -^2, . . . , vk} of a high-dimensional 
stimulus s determin e whether a neuron spikes or not (Agiiera 
[y Areas &: Fairhall[ |2003| ). In other words, the neuron is 
viewed as implementing a probabilistic dependency chain: s — )► 
{vi,V2, ■ ■ ■ ,vk} J^(yiii^2i • • • i'Vk) ~^ spike, which implies a 
chain of information processing inequalities: /spike(s; spike) > 
4pike({'i^i,'i^2, . . • ,'i;K}; spike) > /spike (A/'('ui, 'U2, . . ., -uk); spike). It 
is possible to estimate /spike(s; spike) from repeated presentations 

of the same stimulus. If r{t) = (^(^ — ^/^)} repeats is the time- 

dependent firing rate, where p = 1, . . . , R indexes the repeats, 
is the time of /i-th spike in repeat p, N(p) is the total number of 
spikes in repeat p, and t G [0, T] denotes time within the repeat of 
length T, the estimate for true information per spike is given by 
( [Brenner et al.||2000| >: 

C.e = |r^*^log.^; (2) 

^ T Jo r r 

here f = 1/T dt r(t) is the average ffiing rate across the repeated 
stimulus segment. This quantity is an upper bound to the infor- 
mation quantities defined above for LN models. The fraction, e.g. 
/spike ({'^15 '^2} I spike) //^^.j^g (between and 1), tells us how well the 
two stimulus projections capture the full dependence of spiking on 



the stimulus. Similarly, /spike(-^('^i: '^^2)! spike) //^^.j^^ (which needs 
to be lower or equal to the information in the two projections for 
the same neuron and stimulus) quantifies how much information is 
further lost when compressing the description of spike-dependence 
from two projections into a single nonlinear combination. 

The information was estimated from Eq ([2J with rates computed 
in 16.67 ms bins (matching the time resolution of the temporal fil- 
ters and the sampling used to calculate D^l in Eq Q), and was 
corrected for small-sample bias by repeatedly estimating the in- 
formation on random subsets of stimulus repeats of varying sizes, 
plotting the information estimates against 1/A^repeats and extrapo- 
lating to infinite number of repeats; we also applied the correction 
for the difference between mean firing rates in the repeated and 
non-repeated stimulus segments ( |Fairhall et al.[ |2006| . The ex- 
pected extrapolation error is below 1%. To obtain an upper bound 
for the systematic error due to short repeat length, we split the 
repeated segment in half and estimated the information separately 
on each half, which resulted in relative differences with a std of 
8%; we expect the true error to be smaller. Information rates were 
estimated by computing the information per spike and multiplying 
by the mean ffiing rate. 

For all neurons recorded in experiment 2 (with non-repeated 
and repeated stimuli), we computed several information- 
theoretic quantities: (i+ii) /spike({'^i , '^^2}! spike, <S)//^p^.j^^(<S) 
is the information fraction captured by 2 (and 1, respec- 
tively) linear filter (s), fit separately to each stimulus condition 
S; (iii+iv) Ispike{J^{vi,V2)\spike,S)/I^^.^^(S) are the frac- 
tions captured by the nonlinear combination of 2 (and 1, 
respectively) projection(s) ffi separately to each stimulus con- 
dition S; (v+vi) /spike({'i^i,'i^2}|spike,global)//g^p^.j^^(*S) and 
/spike (-^(i;!, '^2) I spike, global) //^p^.j^g(«S) are the fractions captured 
by a single 2D model (by two projections and their nonlinear 
combination, respectively) that has been ffi globally to all 
stimulus stimuli S. These quantities were all estimated using 
CTW estimator for Kullback-Leibler divergence. When esti- 
mated on spike trains that have been shuffled with respect to 
the stimulus, the estimator yields negligible values below 10~^ bits. 

Predicted information rate of LN models. To computa- 
tionally simulate the effect that the (lack of) adaptation would have 
on the information rate when the stimulus statistics changes, we 
used the LN models inferred at high contrast C++ to predict the 
ffiing rate r(t) for stimuli with contrasts C < C++ and ask how 
much information such neurons would carry per spike in the ab- 
sence of any adaptation. This information in the predicted rate, 
/rate, was evaluated using Eq (|2} and expressed as a fraction of 
information captured by the two relevant filters (which does not 
depend on contrast and only serves as a normalization). We simi- 
larly asked how the same quantity would behave when the global 
models (single LN models for all cells that are fit across all stimuli, 
and therefore have no adaptation) were used to encode information 
into the rate for each of the skewed stimuli. 



III. RESULTS 

To characterize how the retina encodes higher-order 
statistics (HOS) of the luminance distribution, we pre- 
sented it with a set of 9 synthetic spatiahy homogenous 
stimuh 5, where the hght intensity of each stimulus frame 
was drawn independently from distributions Ps{L) that 
were matched in mean L (see Materials and Methods). 
The stimuli differed systematically in contrast, skewness, 
and kurtosis, as depicted in Fig[l] The particular values 
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were chosen to span the range of skewness and kurtosis 
which are found in natural scenes, as shown in Fig.[2]A.-E. 
To span these ranges, contrast values C had to be chosen 
in the low range, due to the hardware limitations of the 
stimulus display. 

To quantify how retinal neurons change their code 
when contrast, skewness, or kurtosis of the stimulus 
change, we constructed accurate encoding models for the 
recorded neurons, and compared their properties under 
the d ifferent stimuli. We thus followed Fairhall et al.l 
( |2QQ6 ), who have shown that for spatially uniform Gaus- 
sian stimuli in the salamander retina, linear-nonlinear 
(LN) models with one or two linear filters often suffice 
to describe the cells' encoding scheme with high accu- 
racy. Moreover, Agiiera y Areas fc Fairhall] (2003) and 
Bialek & de Ruyter van Steveninck ( 2005 ) also provided 



an interpretation of the filtering operations as dimen- 
sionality reduction on the stimulus space, the success of 
which can be quantified with information theory. Here we 
extended their framework to non- Gaussian stimuli and 
analyzed how information is encoded beyond linear fil- 
tering stage, in the nonlinear response, and finally in the 
spiking rate. We could then characterize adaptation and 
invariance quantitatively, and compare the behavior of 
real neurons with computational models that either have 
or lack adaptation. 



1. Linear filters of retinal ganglion cells do not adapt to 
changes in higher-order stimulus statistics 

We recorded from 23 retinal ganglion cells that were 
presented with 9 types of non-repeated stimuli (2 Gaus- 
sian + 7 higher-order statistics) in experiment 1, and 
from 40 cells presented with non-repeated and repeated 
stimuli of 7 types with higher-order statistics (see Mate- 
rials and Methods) in experiment 2. Figure [sjA. shows the 
estimated information rate of the neurons as a function of 



their firing rate. Consistently with previous reports ( Bal- 



asubramanian & Sterling 2009), the information rate / 
scaled weakly sub-linearly with the mean firing rate f 
(/ (X f^-^^). There were no other large systematic de- 
pendencies in transmitted information across cells and 
stimulus classes. 

Next, we inferred the best linear filters for each cell, 
and each of the stimulus condition 5, separately. We 
used maximally informative dimensions (MID) for learn- 
ing the filters for all stimuli, and for the Gaussian stimuli 
we additionally used spike-triggered average and spike- 
triggered covariance. We also inferred a global model for 
each cell, where a single filter (or a single pair of filters) 
was fit across all stimulus conditions (see Materials and 
Methods). In cross-validation on test data, the prediction 
performance of the models of essentially all cells (97% of 
cell/stimulus combinations) increased when using two fil- 
ters (2D LN models), compared to one-dimensional LN 
models, but in some cases the contribution of the sec- 
ond filter was very small. The linear filters inferred us- 



ing MID for one of these cells are shown in Fig. [3^ for 
9 stimulus conditions; overlaid is the leading eigenvector 
of the spike-triggered covariance matrix computed for the 
C++ stimulus, and the best global filter learned by MID 
(all filters are scaled to unit norm). The filters show very 
strong overlap, indicating that their shape does not adapt 
to the stimulus distribution. We emphasize that comput- 
ing the naive STA estimates gives a systematic change in 
filter shape with the stimulus skew, as shown in Fig. [3p, 
but this is simply an artifact of the STA estimation on 
non-spherically-symmetric stimuli, and is not indicative 
of any adaptation process. 

Figures |3p)-E show a typical cell for which a model 
with two linear filters is needed. We again observe a 
high overlap between the filters inferred using MID in 9 
stimulus conditions, the filter pair computed using STC 
in the Gaussian condition, and the single global best pair 
of filters inferred across all conditions using MID. 15 of 
the 23 cells in experiment 1 have an average firing rate 
above 1.5 Hz for every stimulus, permitting reliable filter 
estimation. Out of those, a single-filter model in the 
Gaussian condition suffices for 8 cells (i.e. the single-filter 
model accounts for more than 90% of the information 
per spike of the two-filter model), and for 7 cells two 
filters are needed. To measure the agreement between 
inferred filters across conditions for each cell, we compute 
the Pearson cross-correlation between the STC derived 
filter(s) in the Gaussian condition, and the filters derived 
using MID for each stimulus condition S. The average 
correlation across the group of cells with a single linear 
filter is 97% ±2%, while the average correlation across 
the group of cells with 2 filters is 86% ± 5% (error bar = 
std across cells); the decrease in the later case is mainly 
attributable to the difficulty of inferring jointly 2 filters 
using MID with a limited number of spikes. 

We quantified the effects of changing the higher-order 
statistics on the shape of the linear filters by comput- 
ing the balance index 6, defined as the ratio between 
the total (signed) area under the filter and the absolute 
area: h = ^^k^/ where ji indexes the tempo- 

ral components of the filter. The balance index across 
the recorded population was h = —0.0350 ± 0.13 (n=40 
cells from experiment 2), where the mean and error bar 
(std) are taken across all cells, conditions and both filters. 
Broken down across conditions, there is a small system- 
atic modulation of b with the stimulus, which is smaller 
than 10% (Fig[4|; additionally, there is substantial scat- 
ter across the recorded population. 

To ask whether these slight variations in filter shape 
across stimuli matter for encoding, we compared the per- 
formance of global filters (constant for each cell across 
all the stimuli) with filters inferred separately for each 
stimulus. We estimated the information captured by the 
single-filter model, by a two-filter model, and by a two- 
filter global model (where a single pair of two filters is 
inferred for all stimuli for each cell). Single-filter models 
(fit to each stimulus separately) captured 69% ± 12% of 
the information per spike (averaged across cells and stim- 
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FIG. 1 Synthetic stimuli used to probe salamander retinal ganglion cells. The probabihty densities, logP5(^), for 
ah 9 stimuh S used, grouped into 3 categories (cyan = Gaussian, magenta = skewed, and yeUow = kurtotic). Ah stimuh are 
matched in mean (225 lux), and aU except for C+ have the same contrast; for details, see Table [l] 
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FIG. 2 Higher-order statistics in natural scenes. A,B) Two example images from the Penn Natural Image Database. 
The grayscale images are calibrated into units of cd/m^. The yellow circle represents the typical size of the salamander retinal 
ganglion cell center. Luminance was averaged in patches of this size, and contrast (C), skewness (S) and kurtosis (K) were 
computed for the distribution over many patches from each image. The distributions P{L) for the two example images are 
shown as insets, and the corresponding values for C,S,K are displayed in the two image panels. C,D,E) The distribution 
of contrast, skewness and kurtosis, respectively, over 501 natural images. Colored squares represent the values of the three 
parameters used in synthetic stimuli (color coded as in Fig[T]). 2 cyan stimuli differ in contrast C but have constant S and K; 
4 magenta stimuli differ in skew S but have constant values of C and K; and 3 yellow stimuli differ in kurtosis K but have 
constant C and S (see Table |l]). 



uli). Two- filter models (fit to each stimulus separately) 
captured 81%±12%of information per spike, as shown 
in Fig. [3p; for some cells, two filters capture essentially 
all of the information. Our observations were quantita- 
tively consistent with the results reported by |Fairhall et| 
[dj([2006). Most importantly, the global models where fil- 
ters did not change with the stimulus capture 77% ± 13% 



of the information per spike, confirming that the minor 
variations between filters inferred in different stimulus 
conditions are insignificant from the encoding perspec- 
tive. 

Taken together, our results show that the shape of lin- 
ear filters in 2D LN models for salamander retinal gan- 
glion cells does not change noticeably when the skewness 
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FIG. 3 Linear filters and higher-order statistics stim- 
uli in retinal ganglion cells. A) Estimated information 
rate (see Methods) as a function of the mean firing rate. Each 
dot represents one of the 40 cells in experiment 2 exposed to 
one of the 7 HOS conditions (skewed stimuli in magenta, kur- 
totic in yellow). The growth in information is slightly sublin- 
ear, with no obvious systematic dependence on the stimulus 
type. B) A cell whose behavior is captured well by a sin- 
gle linear filter. Shown in light blue are the filters for all 9 
(2 Gaussian, 7 HOS) stimuli reconstructed using maximally 
informative dimensions; in black the spike-triggered average 
computed on the Gaussian stimulus C++; in red, a single 
global filter inferred using MID across all stimulus conditions 
simultaneously. C) Biased STA filter estimates for 5 skewed 
stimuli (thicker lines mean increasing skewness) for the same 
cell as in B (note the difference in the time axis). D,E) A cell 
whose behavior is described well by two linear filters (light 
blue = the most informative dimension; dark blue = the sec- 
ond most-informative dimension). Other symbols the same 
as in B). F) Information captured by two filters (across stim- 
uli, horizontal axis), as a fraction of the total information per 
spike; mean and std across 40 cells in experiment 2. The av- 
erage performance of global models (the same pair of filters 
across all stimuli for each cell) is plotted as red squares. 

and kurtosis vary across the range observed in natural 
scenes. 

2. Nonlinearities of retinal ganglion cells responding to 
higher-order statistics stimuli 

Completing the LN description of the ganglion cells is 
the mapping from the linear projection(s) of the stimulus 
into the cell's firing rate. We estimated these 2D nonlin- 
ear functions from the data by binning P(vi, spike), 
where Vi = • s are the projections of the stimulus 
onto the two filters, ki, k2, as explained in Materials and 
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^(S--) i(S-) L(S+) MS++) ^{K-) A(K-) ^(K+) 

FIG. 4 The dependence of the balance index on the 
stimulus type. The balance index 6 is a ratio between the 
total (signed) area under the filter, normalized by the absolute 
area; balanced filters have 6 = 0, fully unbalanced 5 = ±1. 
Shown are averages (±1 std error bars) for 40 cells in experi- 
ment 2, computed across both linear filters for each cell. 



Methods. This was done for each neuron and each con- 
dition separately, or for all conditions jointly using the 
global pair of filters, to yield a single 2D global LN model 
for every cell. Figure [5}^ shows a global nonlinear func- 
tion for an example neuron. In Fig. [5^ we explicitly 
show, for that same neuron, the prior ensembles for all 7 
higher-order statistics stimuli (gray) , with overlaid spike- 
triggered ensembles for skewed (magenta) and kurtotic 
(yellow) stimuli, along with the marginal projections of 
these distributions. The number of spikes is in general in- 
sufficient to reliably sample and then systematically com- 
pare the 2D nonlinear functions across cells and stimuli. 
We therefore decided to base our comparisons directly on 
the firing rate prediction performance. The 2D models 
for cells in experiment 2 are fit on the non-repeated seg- 
ments, and are subsequently tested by predicting PSTH 
in response to repeated stimulus presentations for which 
we could measure the true PSTH; here, too, the predic- 
tion was done either with models fit separately at each 
condition, or with the global model, where 2 linear fil- 
ters and the nonlinearity were fit simultaneously across 
all stimuli, as shown in Fig. [5p. 

Does the nonlinearity change with the stimulus condi- 
tion? We first estimated how much information is lost 
in compressing the 2D projections {^1,^2} into the non- 
linear combination, A/'(vi,V2), on average. As shown in 
Fig. jsp, the nonlinearity captured 80-85% of the infor- 
mation per spike (fit for each condition separately), es- 
sentially the same amount as the two linear filters (c.f. 
Fig. [3^) . This finding establishes that the nonlinear map- 
ping itself does not discard the information per spike, and 
that analyzing the changes in point- wise nonlinearities is 
warranted. In terms of PSTH prediction, the prediction 
of the ID LN models, fitted to each conditions separately, 
had 64% ± 8% correlation with the real PSTH (error bar 
= std across 40 cells and 7 stimuli). 2D LN models were 
better with 74% ± 9%, as shown in Fig[5p). The global 
models that had constant filters and nonlinearity across 
7 higher-order statistics conditions, performed negligi- 
bly lower, with 72% ± 9% correlation, mostly due to the 
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FIG. 5 Nonlinearities and rate prediction with higher-order statistics stimuli. A) A 2D nonlinearity globally fit 
across all HOS stimuli for neuron E1C2; projection of the first (second) filter shown on the horizontal (vertical) axis. Hotter 
colors indicate increased firing rates (see colorbar, rate in Hz; white = regions of {vi,V2} space where no spike or prior samples 
have been observed). B) For the same cell, the depiction of prior ensemble (gray dots, all 7 higher-order statistics stimuli 
overlaid) and the spike-triggered ensembles (magenta = skewed stimuli, yellow = kurtotic stimuli); shown are also projections 
of the data, i.e. the marginal distributions P{vi) and P{v2), on the logarithmic scale, for all 7 stimuli separately. C) The 
segment of predicted and true firing rate in responses to repeated K- stimulus presentations (red = 2D LN global model fit to 
all stimuli for this neuron; black = true rate). D) Model performance, measured as the Pearson correlation (PC) between the 
true and predicted PSTH, across different stimuli (horizontal axis; average and error bars = mean and std across 40 neurons in 
experiment 2). The performance of 2D LN models fit separately for each stimulus is shown by magenta (skewed stimuli) and 
yellow (kurtotic stimuli) bars. Global model performance (red squares) matches the performance of models fit separately. Right 
axis, in green: information fraction captured by the nonlinear combination of the 2 linear projections, N'{vi, shows no drop 
compared to the information captured by the linear features themselves (c.f. bars in Fig.|3^), and is between 80 — 85% across 
all stimuli (error bars omitted for clarity, comparable to error bars in information fraction captured by the 2 linear features). 



sparse K+ stimulus condition (where the spike rate is the 
lowest); there was essentially no drop for other stimulus 
conditions in performance of global vs. separate mod- 
els. Our simulations show that the limitations to the 
performance of LN-based models in predicting the firing 
rate are due to the inability of these models to capture 
the refractory and spike- feedback effects (which would 
be possible with e.g. generalized linear models (Pillow 
et al.[ |2008D or Keat-type models ( |Keat et al."| | 200ip ). 
We reiterate, however, that the prediction performance 
was used here solely as a differential measure, to eval- 
uate the significance of the changes in linear filters and 
nonlinearities with the stimulus condition. 

Finally, we can ask how successfully the global models 
recapitulate the overall firing rate changes with the stim- 
ulus statistics. Figure |6|^ shows the relative change in 
firing rates for cells from experiment 2 for 7 HOS stim- 
uli; the cells have been sorted to reveal the dominant 
pattern, where cells that prefer left-skewed stimuli re- 
spond less strongly to the other stimuli, while cells that 
respond strongly to right-skewed stimuli also respond to 
negative kurtosis. We can use a global 2D LN model fit 
for every cell to predict the mean firing rate in response 
to all of the 7 stimuli. These models (without any adap- 
tation) reproduced very well the mean firing rate for each 
stimulus and cell, as depicted in Fig. [66, and therefore 



also the pattern of changes in the firing rate (not shown). 



3. Adaptation to changes in contrast and invariance to changes 
in higher-order statistics 

We observe that the encoding properties of salamander 
ganglion cells do not depend on the higher-order statistics 
of the luminance levels. To establish this, we compared 
the shape of linear filters across the stimulus conditions 
directly (by measuring their overlap), by information- 
theoretic measures (information per spike captured), and 
through the impact on the prediction performance; simi- 
larly, we assessed the changes in the nonlinearity by mea- 
suring their impact on the prediction performance. In 
all cases — with the possible exception of highly kurtotic 
(K+) stimulus — we found that global models, i.e. mod- 
els with invariant pair of filters and invariant nonlinear- 
ity, account for the neural behavior equally well as the 
models fit to different stimuli separately. Since neurons 
are nonlinear dynamical systems coupled to the stimu- 
lus, we find this extent of invariance surprising - we ex- 
pected that higher-order statistics in the stimuli would 
"interact" with the photoreceptor-bipolar-RGC cascade 
to yield a notable change in the effective encoding prop- 
erties, especially for heavily skewed or bimodal stimuli. 
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FIG. 6 Measurement and prediction of changes in the 
mean firing rate with stimulus condition. A) The rela- 
tive change (color scale, 1 = the firing rate of the cell is equal 
to the mean rate for that cell across all stimuli) in the mean 
firing rate for 40 cells of experiment 2, as a function of the 
stimulus condition (magenta = 4 skewed, yellow = 3 kurtotic 
stimuli). Neurons (rows) were sorted by the projection on 
the first principal component explaining most of the change 
across the recorded population; cells close to the top increase 
the firing rate in response to left-skewed stimuli, while cells 
at the bottom increase the rate in response to right-skewed 
and negative kurtosis stimuli. B) Global 2D models for each 
cell predict the average rate well (each dot is one cell in one 
of the 7 HOS stimulus conditions). 



We next analyzed the recordings from experiment 1 
where neurons were exposed to high (C++) and low (C+) 
contrast stimuli. Since our analysis kept the filters nor- 
malized to unit length, contrast adaptation would be re- 
flected in the change of the shape of the nonlinearity. 
Indeed, this can be seen in Fig. [7]A. (inset), which shows 
the (marginal) nonlinearity, A/'(^), of a typical neuron 
for the high- and low-contrast experiments. We then 
took the nonlinearity from the low-contrast experiment 
and rescaled it as follows. First, we rescaled the range 
of inputs to the nonlinearity, = ki • s, by the ratio 
of high to low contrast, C++/C+= 1.82. Second, we 
also rescaled the output firing rate by the ratios of the 
steady- state firing rates in both C+ and C++ conditions 
(the rates are not equal because the neurons do not adapt 
perfectly). After these two rescaling operations, the mea- 
sured nonlinearity for C++ (high contrast) stimulus lined 
up well with the rescaled nonlinearity from the C+ (low 
contrast) stimulus, indicating the ability of the neuron to 
adapt to contrast. This observation was generally true 
for most (19 out of 23) neurons in our dataset, as shown 
in Fig. [tJ^. The rescaling fails at very high firing rates, 
because they are not accessed in the low contrast condi- 
tion, and (potentially) because we were only looking at 
the marginal ID (and not the full 2D) nonlinearity. 

When the stimulus contrast changes, retinal ganglion 
cells adjust their gain, matching the variation of the sig- 
nal about the mean to the dynamic range of the fir- 
ing rate at the output, thereby keeping the informa- 
tion rate high. Without adaptation, the information 
rate would drop because the neurons have a limited 
output range and they are noisy. "Noise" in the con- 



text of LN encoding models is the stochasticity related 
to the spike generation: from the stimulus s to spike, 
s {vi^V2} J^{vi^V2) spike, it arises when the 
nonlinear function JV is interpreted as the mean firing 
rate of a Poisson point process. To explore the effects 
of the presence or absence of adaptation, we generated 
spikes according to this LN prescription in response to 
various stimuli, and measured the information in such 
synthetic spike trains using Eq. By using the true 
inferred (adapting) models for low and high contrast, we 
found that in both conditions the real spiking neuron 
can retain ^ 90% of the information extracted from the 
stimuli by the linear filters. On the other hand, when us- 
ing the model inferred at high contrast, holding it fixed 
(no adaptation), and probing it with stimuli of progres- 
sively lower contrast, the information rate dropped signif- 
icantly, as shown in Fig. [7^. The situation is very differ- 
ent for skewness (or kurtosis): Fig.jTp shows that no such 
drop in information is observed when the global model is 
used to generate spikes in case of skewed stimuli, making 
adaptation unnecessary and invariant encoding possible. 

This effect is easy to understand if we compare the 
extent to which the 9 stimulus distributions differ a 
priori, after filtering by the neuron's linear filters, and 
after passing through the nonlinear function. Fig- 
ure |8]A. shows a 9 X 9 matrix of the Kullback-Leibler 
distances DKL{Pi{s)\\Pj{s)) between all pairs of stim- 
uH ij = {C+,C++,S-,S-,S+,S++,K-,K-,K+} (2 Gaus- 
sian, 7 higher-order statistics). The bimodal stimulus K~ 
is clearly distinct from the others. After linear filtering 
(Fig. [8^), however, the low contrast stimulus C+ differs 
the most from the others, which are all matched in con- 
trast. Because linear filters, as we have shown, do not 
change in shape, this is simply a consequence of the cen- 
tral limit theorem: the filters sum up (with weights) sam- 
ples drawn independently from the stimulus distributions 
P5, so the filter outputs must converge to Gaussian dis- 
tributions with variances that are related to the variance 
(or contrast) of the input. In other words, the invariant 
linear filters remove the signatures of higher order statis- 
tics and "equalize" different stimuli with the exception 
of their contrast. In the last, nonlinear stage (Fig. [sjC), 
the nonlinearity adapts to contrast as well, ultimately 
yielding LN model outputs whose distributions are very 
similar across the range of stimuli differing in contrast, 
skewness and kurtosis. 



4. Should the linear filters adapt to changes in higher-order 
statistics? 

We previously established that the retinal ganglion 
cells do not adapt their linear filtering properties to 
changes in stimulus skewness and kurtosis. Taking this 
invariance of linear filters as given, we next showed that 
linear filtering and contrast adaptation together can suc- 
cessfully remove the signatures of higher-order statistics 
from the distribution of neural firing rates. Here we re- 
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FIG. 7 The benefits of contrast and higher-order statistics adaptation. A) Inset. The ID nonlinearity, jV{vi), for an 
example neuron (E0C4) inferred at high contrast (C++, hght blue), and at low contrast (C+, dark blue). The low contrast 
nonlinearity can be aligned to the high contrast one by (i) rescaling the stimulus (horizontal) axis by the ratio of the two 
contrasts, and (ii) rescaling the firing rate (vertical) axis by the ratio of the two average firing rates, yielding the red line. 
Main panel Scatter plot of the nonlinearity at high vs nonlinearity at low contrast (black, 19 neurons from experiment 1; 
the coordinates of each point are the high / low C nonlinearity values at the same value of the projection vi for a particular 
neuron). After rescaling, the nonlinearities align (red). The scaling breaks down for rates above 30 Hz (rarely observed at low 
contrast). B) The information in the spiking pattern of a LN model neuron, normalized by the information captured by the 
two linear projections of the corresponding stimulus. Cyan circles = inferred models for 19 neurons for high and low contrast 
(C++, C+) stimulus. Green line = computational prediction obtained by taking 19 high contrast models and dialing down 
the stimulus contrast without any adaptation in the model (error bars = std across the neurons). C) Analogous analysis for 
changes in skewness (note the difference in scale); magenta = models inferred separately for each skewed stimulus; green = 
invariant (and therefore non-adapting) global models for every cell. 
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FIG. 8 Neurons with contrast adaptation yield similar distributions of firing rates in response to very diff'erent 
distributions of inputs. A) KuUback-Leibler distance matrix, Dkl (bits), between all 9 pairs of stimulus distributions (cyan 
= 2 Gaussian, magenta = 4 skewed, yellow = 3 kurtotic distributions). Bimodal K~ distribution is most different from the 
others. B) The difference between the respective 2D linear projections of the 9 stimulus distributions (shown are the averages 
over Dkl matrices for 19 neurons in experiment 1). Linear filtering of IID stimuli washes out most of the higher-order structure 
(but not the second order), and the most distinct stimulus type at this stage is C+, since its variance is different from the other 
distributions of projections. C) Dkl (average over 19 neurons) between the nonlinear transformations of the respective linear 
projections. Since the nonlinearity adapts to contrast, this step equalizes the low contrast (C+) with the other stimuli. 



turn to the observation of invariant linear filters to ask cal grounds, 
whether such invariance should be expected on theoreti- 



In a one-dimensional LN model neuron, the probabil- 
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ity of spiking was assumed to be a saturating nonlinear 
function of the filtered stimulus: 



P(spike|s) = - tanli(k(5) • s + 0{S)) 



1 

2' 



(3) 



where the linear filter k and the spiking threshold 
may depend on the stimulus type S. We simulated 
spike sequences a of this model neuron in response to 
repeated presentations of the stimulus, whose value in 
each time bin was drawn independently from from Ps {L) : 
r) = {0, 1} was '0' when the neuron was silent in time 
bin t and during repeated presentation r of the stimulus, 
and '1' if it spiked. To quantify how well such a neuron 
encodes information about the stimulus into spike trains, 
we estimated the information rate / (in bits per second) 
between the sti muli a nd the response, using the direct 
method ( Strong et all [1998) . 

For stimulus types S of varying skewness, we found the 
optimal filter k and the threshold that would maximize 
the information rate / that the neuron would convey. If 
real neurons were adapting in such a way, this procedure 
would then predict how their filters would change with 
the stimulus distribution. For computational reasons, 
our model was simplified: (i) the linear filter was bipha- 
sic, with a "fast" lobe of amplitude A/, and a "slow" 
lobe of amplitude A^, whose widths and the positions 
were fixed, so the filter was fully specified by the two am- 
plitude parameters {As^Af} as schematized in Fig. [9|^; 
(ii) we maximized the information rate / for a fixed av- 
erage firing rate f; (iii) the effective noise of the neu- 
ron, or fraction of output entropy that is lost to noise, 
V = 5'noise/5'totai, was fixed. 

Figure [9] shows the dependence of the information rate 
on the shape of the stimulus filter: left-skew distributions 
(S— ) slightly favor OFF cells (negative fast lobe, positive 
slow lobe. Fig. [9^), while right-skew distributions (S++) 
slightly favor ON cells (not shown). This conclusion was 
robust to noise in the neuron 77 ranging from ~ to 0.5. 
For a specific choice of 77 = 0.2 in Fig. [9p, Fig. [9p shows 
the dependence of information rate on the ratio of the 
fast to slow lobe amplitude, tanc/) = Af/Ag. For each 
stimulus (S++ and S~), there are two local maxima, one 
for an ON-like and one for an OFF-like cell, which differ 
in the transmitted information by less than 10%. Im- 
portantly, however, the maxima are not achieved at the 
same value of (j) in both stimulus conditions - this means 
that while an adapting ON or OFF cell can maintain the 
same rate of information transmission when the stimu- 
lus changes, it will need to modify the filter shape by 
adjusting the ratio Af/Ag. 

Focusing on the case of an OFF cell, we found that an 
optimally adapting cell would increase the area under the 
fast (negative) lobe and would decrease the area under 
the slow (positive) lobe with increasing skewness. We 
used the balance ratio 6, introduced previously, to quan- 
tify this change. Figure [9^ shows significant changes in 
the filter shape with skew that range from b = 0.3 for S~ 
to b = —0.3 for S++ (as expected, the optimal filter for 



the Gaussian stimulus is balanced (6 = 0), with equal and 
opposite areas under the two lobes). However, these sub- 
stantial changes in the filter shape only lead to moderate 
changes in the amount of encoded information: in the 
case shown in Fig. [9j the information gain of the adap- 
tive neuron relative to the case of no adaptation is always 
less than 10%. While the exact number varies with the 
chosen constraints (f, 77, the locations and widths of the 
filter lobes), two qualitative observations remain true: (i) 
the neurons should adjust the biphasic linear filter away 
from the balanced configuration (optimal for Gaussian 
stimuli) by systematically adjusting the weight under the 
fast and slow lobes so that, e.g. in case of the OFF cell, 
negative skew favors larger weight in the slow lobe; but 
also that (ii) these adaptive changes would only lead to 
small relative information gains. 



IV. DISCUSSION 

We explored the limits of retinal adaptation, by ana- 
lyzing the responses of salamander retinal ganglion cells 
to temporally uncorrelated and spatially uniform stimuli 
with Gaussian, skewed and kurtotic luminance distribu- 
tions. While the retina is highly adaptive to changes in 
first and second order statistics, we found invariance of 
neural encoding to changes in stimulus skewness and kur- 
tosis in both the linear filt ering stage and the nonlinear 
stage. [Bonin et al. (2008) reported a similar result in 
the cat LGN for spatially structured stimuli, using ID 
LN models inferred using reverse correlation; we see an 
analogous invariance to skew and kurtosis for spatially 
uniform stimuli in the retina, for 2D models inferred us- 
ing a theoretically unbiased method, while exploring a 
substantially wider range of skewness and kurtosis val- 
ues. Taken together, these results suggest that adapta- 
tion in the early visual system is not sensitive to changes 
in commonly measured higher-order statistics. 

From an information optimization point of view it is 
not immediately clear why the retina does not adapt 
to such higher-order changes. Moreover, since neurons 
are nonlinear dynamical systems coupled to their inputs, 
a large change in the statistical structure of the input 
would, on general grounds, be expected to influence the 
neuron's effective encoding properties, even if it did not 
lead to an increase in information transmission. Thus, in- 
variant encoding in the face of substantial changes in the 
stimulus statistics is not an obvious a priori expectation. 

Adaptive code is necessarily ambiguous. The same re- 
sponse from an adapting neuron can, for example, signal 
two different light intensities, depending on the stimulus 
history. Downstream neurons must therefore rely either 
on keeping track of the adaptive state of the encoding 
neuron, or on using diversity in the neural population in 
a proper way to estimate the stimulus. Moreover, non- 
trivial processing is required also on the encoding side: 
the adaptive neuron needs to infer from the stimulus it- 
self whether some underlying property of the stimulus 
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FIG. 9 Finding optimal filters for an LN model neuron for stimuli with negative skew (S— ). A) The biphasic fiher 
with two parameters determining the amphtudes of the fast and slow lobes, {Af,As}. Each of the amplitudes can be positive 
or negative. When As is positive and Af is negative, the cell is an OFF cell; when As is negative and Af is positive, the cell 
is an ON cell. B) Information about the stimulus encoded in the spike train (bits per second, color scale), as a function of the 
fast lobe amplitude Af and the slow lobe amplitude As. The ON and OFF types have been denoted in the 2 corresponding 
quadrants of the plot. C) Fraction of entropy lost to noise, 77, as a function of {Af, As}. Points in the plane that have 77 ~ 0.2 
are shown in white, and lie on a circular locus of points; we are only interested in the models with fixed value of 77. We 
parametrize such points by their angle, 0, going counterclockwise from the vertical {Af — 0). D) Information on the locus of 
points ?7 ^ 0.2 as a function of 0; these values are extracted from B) along the 77 = 0.2 contour. Green points correspond to 
ON cells, cyan points to OFF cells. There are two peaks in information, one (slightly higher) peak for the OFF type cell and 
one for the ON type cell. In all plots, the optimal OFF cell is denoted by a magenta dot. E) Theoretical prediction for the 
shape of the optimal biphasic OFF filters for stimuli with different skewness values. As skewness increases from negative (S~, 
red) to positive (S++, blue), the negative lobe becomes more prominent and the positive lobe becomes less prominent. For 
the symmetric Gaussian stimulus (C++, green) the optimal filter is balanced. These changes are quantified by the balance 
index b (see text), which measures the difference in area between the lobes, normalized to the total absolute area under the 
filter. For the simulations in this figure, the stimulus refresh time is 10 ms and mean firing rate is held fixed at 5 Hz (results 
are qualitatively unchanged for rates up to fourfold higher). 



(such as the mean luminance or contrast) has changed 
and thus an adaptive r espon se needs to be triggered (but 
c.f. also |Borst et aL' (2005)). In short, adaptive codes 
incur computational costs that invariant codes do not. 

These considerations suggest that there exist changes 
in stimulus statistics for which the benefits of adapta- 
tion outweigh the costs. It also implies the existence of 
changes where the encoding properties should remain in- 
variant, even if other equally good encoding strategies 
were available to the cell, thereby making the code easy 
and unambiguous to decode. Such changes would be 
those where, even without adaptation, the information 
rate would not drop precipitously when the stimulus dis- 
tribution changes. 

Our results lead us to believe that an encoding model 
with non- adapting linear filter (s) and a nonlinearity, 
which adapts to contrast but not to higher-order statis- 
tics, is an efficient strategy that reduces the inherent am- 
biguity in the code. Contrast adaptation is necessary to 
prevent a severe drop in the information rate after a con- 
trast change. If in addition the linear filters are invariant, 
the filtering, together with the central limit theorem, can 
remove higher-order statistics. This would leave the lu- 
minance and the contrast as the only target statistics 
for adaptation, while simultaneously maintaining a high 
information rate and minimizing the code ambiguity. 

We did not observe any adaptive changes in the filter 
shape either for skewed and kurtotic stimuli, or after the 
change in contrast. Small c hanges in filter shape h ave 
been previously reported by Smirnakis et al. (1997) in 



response to large changes in contrast, beyond the values 
we used here ("faster" filters at high contrast values). It 
is not clear how one could assess the cost of adding an 
adaptive mechanism that modifies the retinal ganglion 
cell filters for higher-order statistics; we can ask, how- 
ever, how much information would be gained by it. Our 
toy model simulations show that while it would be theo- 
retically possible to increase the transmitted information 
by adjusting the shape of the linear filter to the stimulus, 
the gains in information would be very modest (around 
10% for a robust range of realistic parameter choices, 
much less than the gain due to contrast adaptation), de- 
spite significant changes in the shape of the optimal linear 
filter. 

One explanation for the lack of dynamic adaptation to 
higher-order statistics is therefore that it does not yield 
much gain in information, while potentially increasing 
the coding cost and complexity. It is possible, then, that 
instead of implementing an adaptation mechanism able 
to dynamically change the filter shape on an individual 
cell basis in response to stimulus skew, the neural pop- 
ulation is structurally adapted to the overall luminance 
distribution of natural scenes, by properly partitioning 



the population between ON- and OFF-like cells (Ratliff 
et all 120101). 



Another explanation for invariant coding in response to 
higher-order statistics could be that the ability to adapt 
is limited by the ability to detect a change in the un- 
derlying statistical structure of the stimulus. For exam- 
ple, to detect reliably a change in kurtosis, the neuron 
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would need to receive some minimal number of indepen- 
dent stimulus samples, but this minimum might still be 
large, making such inferences hard. More fundamentally, 
it is not clear what are the actual statistics that th e neu- 
rons are adapting to (Bonin et al.. 2008; Simmons et al. 



2009). While we commonly think in terms of contrast. 



skew, and kurtosis as the relevant statistical properties 
of stimuli, it is not obvious that the brain relies on these 
same measures in dealing with natural scenes. In par- 
ticular, they may be poor choices in natural settings, as 
their values are sensitive to outliers and becau se they 
might vary in a dependent way in nature (c.f. iMante 



et al. ( 2005| )). Perhaps then, the retina (and other neu- 
ral systems) may be using other estimators for contrast-, 
skew-, and kurtosis-like statistics? 

Some of these important issues could be addressed by 
extending our analysis either by using temporally cor- 
related stimuli, or by heavy-tailed (or naturalistic) lu- 
minance histograms - if they can be reproduced in the 
l ab using display hard ware with a larger dynamic range 
(Rieke & Rudd, |2009 ). Both of these extensions would 
affect the central limit theorem argument above: in the 
case of temporally correlated stimulus, the samples would 
no longer be independently drawn and thus might not 
(quickly) converge to a Gaussian; in the second case, the 
linear filter will not be able to "erase" the signatures of 
higher-order statistics, should the decay of the luminance 
distribution be too shallow. As both extensions would 
bring the stimuli closer to the true naturalistic ones, they 
could provide us with a more complete window into the 
nature of retinal adaptation to natural scenes. 
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